A Rank Aggregation Algorithm for Ensemble of Multiple Feature Selection Techniques in Credit Risk Evaluation

Author

  • Shashi Dahiya
Abstract

In credit risk evaluation, classifier accuracy is critical for correctly identifying high-risk loan applicants. Feature selection is one way to improve that accuracy: it supplies the classifier with the important and relevant features for model development. This study applies an ensemble of multiple feature ranking techniques to feature selection on credit data, using five individual rank-based feature selection methods. It proposes a novel rank aggregation algorithm for combining the ranks produced by the individual methods in the ensemble. The algorithm uses both the rank order and the rank score of each feature in each method's ranked list. The ensemble selects the relevant features by taking the top 80%, 60%, 40% and 20% of the aggregated ranked list, and these subsets are used to build C4.5, MLP, C4.5-based Bagging and MLP-based Bagging models. Models built with the ensemble of multiple feature selection techniques performed better than models built with the five individual rank-based feature selection methods. Averaged over all models, performance was best for the ensemble at the 60% threshold. The bagging-based models also outperformed the individual models most significantly at the 60% threshold. This gain is all the more notable because the highest-performing models were built with 40% fewer features, which reduces the dimensionality, and hence the overall size, of the data used for model building. The ensemble of feature selection techniques with the novel aggregation algorithm thus yielded more accurate models that are simpler, faster and easier to interpret.

Keywords: Classification; Credit Risk; Feature Selection; Ensemble; Rank Aggregation; Bagging
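
The abstract does not give the aggregation formula, so the following Python sketch is only an illustration of the general idea, not the paper's algorithm. It assumes that each ranker's contribution is the sum of a rank-order term and a min-max scaled rank-score term, and that a threshold keeps the top percentage of the aggregated list. The feature names, the score values, and the helper names `aggregate_ranks` and `select_top` are all hypothetical.

```python
from typing import Dict, List


def aggregate_ranks(score_tables: List[Dict[str, float]]) -> List[str]:
    """Combine several feature rankings into one list, best feature first.

    Assumption (not from the paper): every table scores the same features,
    and a higher score means a more relevant feature.
    """
    features = sorted(score_tables[0])
    n = len(features)
    totals = {f: 0.0 for f in features}
    for scores in score_tables:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for position, f in enumerate(ordered):
            order_part = (n - position) / n       # rank order: 1.0 for the top feature
            score_part = (scores[f] - lo) / span  # rank score scaled to [0, 1]
            totals[f] += order_part + score_part
    return sorted(features, key=lambda f: totals[f], reverse=True)


def select_top(ranked: List[str], percent: int) -> List[str]:
    """Keep the top `percent` of the ranked list (at least one feature)."""
    keep = max(1, len(ranked) * percent // 100)
    return ranked[:keep]


# Hypothetical scores from three rankers on five made-up credit features.
rankings = [
    {"amount": 0.9, "duration": 0.7, "income": 0.4, "age": 0.2, "purpose": 0.1},
    {"amount": 0.5, "duration": 0.8, "income": 0.6, "age": 0.1, "purpose": 0.3},
    {"amount": 10, "duration": 6, "income": 8, "age": 2, "purpose": 4},
]

ranked = aggregate_ranks(rankings)
for percent in (80, 60, 40, 20):
    print(percent, select_top(ranked, percent))
```

Scaling each ranker's scores before summing keeps a method with a large numeric range (the third table here) from dominating the others; a different weighting of order and score would be an equally plausible reading of the abstract.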

Similar resources

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...

Classification Performance of Rank Aggregation Techniques for Ensemble Gene Selection

A very promising tool for data mining and bioinformatics is ensemble gene (feature) selection. Ensemble feature selection is the process of performing multiple runs of feature selection and then aggregating the results into a final ranked list. However, a central question of ensemble feature selection is how to aggregate the individual results into a single ranked feature list. There are a numb...

Comprehensive causal analysis of occupational accidents’ severity in the chemical industries; A field study based on feature selection and multiple linear regression techniques

Introduction: The causal analysis of occupational accidents’ severity in the chemical industries may improve safety design programs in these industries. This comprehensive study was implemented to analyze the factors affecting occupational accidents’ severity in the chemical industries. Methods and Materials: An analytical study was conducted in 22 chemical industries during 2016-2017. The stu...

Correlation based Feature Selection using Rank aggregation for an Improved Prediction of Potentially Preventable Events

This paper presents a methodology for developing a novel feature selection model that will help in a more accurate and robust prediction of patients with the risk of Potentially Preventable Events (PPEs). PPEs are admissions, readmissions, complications and emergency department visits that could have been avoided if the patient had been given the appropriate interventions. Various clinical fact...

Development of Discriminant Analysis and Majority-Voting Based Credit Risk Assessment Classifier

This article presents research on a method for credit risk evaluation that combines an expert majority-based ensemble voting scheme with discriminant analysis as the basis for expert formation and popular machine learning techniques for classification, such as decision trees, rule-based inducers and neural networks. Both single expert and multiple expert evaluations were applied as basis for for...

Publication date: 2016